2023 Conference article Open Access

Leveraging inter-rater agreement for classification in the presence of noisy labels
Bucarelli M. S., Cassano L., Siciliano F., Mantrach A., Silvestri F.
In practical settings, classification datasets are obtained through a labelling process that is usually done by humans. Labels can be noisy as they are obtained by aggregating the different individual labels assigned to the same sample by multiple, and possibly disagreeing, annotators. The inter-rater agreement on these datasets can be measured while the underlying noise distribution to which the labels are subject is assumed to be unknown. In this work, we: (i) show how to leverage the inter-annotator statistics to estimate the noise distribution to which labels are subject; (ii) introduce methods that use the estimate of the noise distribution to learn from the noisy dataset; and (iii) establish generalization bounds in the empirical risk minimization framework that depend on the estimated quantities. We conclude the paper by providing experiments that illustrate our findings.
Source: CVPR - 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 3439–3448, Vancouver, Canada, 17-24 June 2023
DOI: 10.1109/cvpr52729.2023.00335
Project(s): SoBigData-PlusPlus via OpenAIRE

See at: ISTI Repository Open Access | ieeexplore.ieee.org Restricted | CNR ExploRA

2017 Journal article Open Access

Tour recommendation for groups
Anagnostopoulos A., Atassi R., Becchetti L., Fazzone A., Silvestri F.
Consider a group of people who are visiting a major touristic city, such as NY, Paris, or Rome. It is reasonable to assume that each member of the group has his or her own interests or preferences about places to visit, which in general may differ from those of other members. Still, people almost always want to hang out together and so the following question naturally arises: What is the best tour that the group could perform together in the city? This problem underpins several challenges, ranging from understanding people's expected attitudes towards potential points of interest, to modeling and providing good and viable solutions. Formulating this problem is challenging because of multiple competing objectives. For example, making the entire group as happy as possible in general conflicts with the objective that no member becomes disappointed. In this paper, we address the algorithmic implications of the above problem, by providing various formulations that take into account the overall group as well as the individual satisfaction and the length of the tour. We then study the computational complexity of these formulations, we provide effective and efficient practical algorithms, and, finally, we evaluate them on datasets constructed from real city data.
Source: Data mining and knowledge discovery 31 (2017): 1157–1188. doi:10.1007/s10618-016-0477-7
DOI: 10.1007/s10618-016-0477-7
Project(s): MULTIPLEX via OpenAIRE

See at: Archivio della ricerca- Università di Roma La Sapienza Open Access | ISTI Repository | Data Mining and Knowledge Discovery Restricted | link.springer.com | CNR ExploRA

2016 Conference article Open Access

On the behaviour of deviant communities in online social networks
Coletto M., Aiello L. M., Lucchese C., Silvestri F.
On-line social networks are complex ensembles of inter-linked communities that interact on different topics. Some communities are characterized by what are usually referred to as deviant behaviours, conducts that are commonly considered inappropriate with respect to the society's norms or moral standards. Eating disorders, drug use, and adult content consumption are just a few examples. We refer to such communities as deviant networks. It is commonly believed that such deviant networks are niche, isolated social groups, whose activity is well separated from the mainstream social media life. According to this assumption, research studies have mostly considered them in isolation. In this work we focused on adult content consumption networks, which are present in many on-line social media and in the Web in general. We found that a few small and densely connected communities are responsible for most of the content production. Differently from previous work, we studied how such communities interact with the whole social network. We found that the produced content flows to the rest of the network mostly directly or through bridge-communities, reaching at least 450 times more users. We also show that a large fraction of the users can be inadvertently exposed to such content through indirect content resharing. We also discuss a demographic analysis of the producers and consumers networks. Finally, we show that it is easily possible to identify a few core users to radically uproot the diffusion process. We aim at setting the basis to study deviant communities in context.
Source: ICWSM 2016 - Tenth International AAAI Conference on Web and Social Media, pp. 72, Cologne, Germany, 17-20 May 2016

See at: www.aaai.org Open Access | CNR ExploRA

2014 Contribution to book Restricted

Recommender systems
Lucchese C., Muntean C. I., Perego R., Silvestri F., Vahabi H., Venturini R.
An abstract is not available.
Source: Mining User Generated Content, edited by Marie-Francine Moens, Juanzi Li, Tat-Seng Chua, pp. 287. London: Chapman and Hall, 2014

See at: www.crcpress.com Restricted | CNR ExploRA

2014 Contribution to book Restricted

Effective Data Access Patterns on Massively Parallel Processors
Capannini G., Baraglia R., Silvestri F., Nardini F. M.
The new generation of microprocessors incorporates a huge number of cores on the same chip. This trades single-core performance off for the total amount of work done across multiple threads of execution. Graphics Processing Units (GPUs) are an example of this kind of architecture. The first generation of GPUs was designed to support a fixed set of rendering functions. Nowadays, GPUs are becoming easier to program. Therefore, they can be used for applications that have been traditionally handled by CPUs. The reasons for using General Purpose GPUs (GPGPUs) in high-performance computations are: raw computing power, good performance per watt, and low costs. However, some important issues limit a wide exploitation of GPGPUs. The main one concerns the heterogeneous and distributed nature of the memory hierarchy. As a consequence, the speed-up of some applications depends on being able to efficiently access the data so that all cores are able to work at the same time. This chapter discusses the characteristics and the issues of the memory systems of this kind of architecture. We analyze these architectures from a theoretical point of view by using K-model, a model for capturing their performance constraints. K-model is used to estimate the complexity of a given algorithm defined on this model. This chapter describes how K-model can also be used to design efficient data access patterns for implementing efficient GPU algorithms. To this end, we use K-model to derive an efficient realization of two popular algorithms, i.e., prefix sum and sorting. By means of reproducible experiments, we validate theoretical results showing that the optimization of an algorithm based on K-model corresponds to an actual optimization in practice.
Source: High-Performance Computing on Complex Environments, edited by Emmanuel Jeannot, Julius Zilinskas, pp. 115–134. Hoboken: John Wiley & Sons Inc., 2014
DOI: 10.1002/9781118711897.ch7

See at: doi.org Restricted | www.scopus.com | CNR ExploRA

2013 Journal article Restricted

Endorsements and rebuttals in blog distillation
Berardi G., Esuli A., Sebastiani F., Silvestri F.
In this paper we test a new approach to blog distillation, defined as the task in which, given a user query, the system ranks the blogs in descending order of relevance to the query topic. Our approach is based on the idea of adding a link analysis phase to the standard retrieval-by-topicality phase. However, differently from other link analysis methods, we check whether a given hyperlink is a citation with a positive or a negative nature, i.e., if it expresses approval or disapproval of the hyperlinked page by the hyperlinking page. This allows us to test the hypothesis that distinguishing approval from disapproval brings about benefits in the blog distillation task. We have tested our method on the Blogs08 collection used in the last two editions (2009 and 2010) of the TREC Blog Track, a collection consisting of more than one million blogs and more than 28 million blog posts. Unfortunately, the experimental results seem to disconfirm the above hypothesis, due to the low level of connectivity of the collection which severely limits the impact of a link analysis phase (and, a fortiori, of the attempt to distinguish endorsements from rebuttals). Application contexts other than the blogosphere (such as, e.g., the domain of eBay transactions) are probably more suited to such an approach.
Source: Information sciences 249 (2013): 47. doi:10.1016/j.ins.2013.05.037
DOI: 10.1016/j.ins.2013.05.037

See at: Information Sciences Restricted | www.sciencedirect.com | CNR ExploRA

2013 Contribution to conference Open Access

Learning to shorten query sessions.
Muntean C., Nardini F. M., Silvestri F., Sydow M.
We propose the use of learning-to-rank techniques to shorten query sessions by maximizing the probability that the query we predict is the final query of the current search session. We present a preliminary evaluation showing that this approach is a promising research direction.
Source: WWW '13 - 22nd International Conference on World Wide Web Companion, pp. 131–132, Rio de Janeiro, Brazil, 13-17 May 2013

See at: dl.acm.org Open Access | CNR ExploRA

2013 Contribution to conference Unknown

Towards leveraging closed captions for news retrieval
Blanco R., De Francisci Morales G., Silvestri F.
IntoNow from Yahoo! is a second-screen application that enhances the way of watching TV programs. The application uses audio from the TV set to recognize the program being watched, and provides several services for different use cases. For instance, while watching a football game on TV it can show statistics about the teams playing, or show the title of the song performed by a contestant in a talent show. The additional content provided by IntoNow is a mix of editorially curated and automatically selected items. From a research perspective, one of the most interesting and challenging use cases addressed by IntoNow is related to news programs (newscasts). When a user is watching a newscast, IntoNow detects it and starts showing online news articles from the Web. This work presents a preliminary study of this problem, i.e., to find an online news article that matches the piece of news discussed in the newscast currently airing on TV, and display it in real-time.
Source: 22nd International Conference on World Wide Web Companion, pp. 135–136, Rio de Janeiro, Brazil, 13-17 May 2013

See at: CNR ExploRA

2013 Contribution to journal Restricted

Editorial - Journal of Discrete Algorithms
Grossi R., Sebastiani F., Silvestri F.
Source: Journal of discrete algorithms (Print) 18 (2013): 1–2. doi:10.1016/j.jda.2012.12.008
DOI: 10.1016/j.jda.2012.12.008

See at: www.sciencedirect.com Restricted | CNR ExploRA

2013 Conference article Open Access

Load-sensitive selective pruning for distributed search
Broccolo D., Macdonald C., Orlando S., Ounis I., Perego R., Silvestri F., Tonellotto N.
A search engine infrastructure must be able to provide the same quality of service to all queries received during a day. During normal operating conditions, the demand for resources is considerably lower than under peak conditions, yet an oversized infrastructure would result in an unnecessary waste of computing power. A possible solution adopted in this situation might consist of defining a maximum threshold processing time for each query, and dropping queries for which this threshold elapses, leading to disappointed users. In this paper, we propose and evaluate a different approach, where, given a set of different query processing strategies with differing efficiency, each query is considered by a framework that sets a maximum query processing time and selects which processing strategy is the best for that query, such that the processing time for all queries is kept below the threshold. The processing time estimates used by the scheduler are learned from past queries. We experimentally validate our approach on 10,000 queries from a standard TREC dataset with over 50 million documents, and we compare it with several baselines. These experiments encompass testing the system under different query loads and different maximum tolerated query response times. Our results show that, at the cost of a marginal loss in terms of response quality, our search system is able to answer 90% of queries within half a second during times of high query volume.
Source: CIKM'13 - 22nd ACM International Conference on Information & Knowledge Management, pp. 379–388, San Francisco, USA, 27 October - 1 November 2013
DOI: 10.1145/2505515.2505699
Project(s): MIDAS via OpenAIRE

See at: Enlighten Open Access | www.dcs.gla.ac.uk | dl.acm.org Restricted | doi.org | CNR ExploRA

2013 Conference article Open Access

Query processing in highly-loaded search engines
Broccolo D., Macdonald C., Orlando S., Ounis I., Perego R., Silvestri F., Tonellotto N.
While Web search engines are built to cope with a large number of queries, query traffic can exceed the maximum query rate supported by the underlying computing infrastructure. We study how response times and results vary when, in the presence of high loads, some queries are either interrupted after a fixed time threshold elapses or dropped completely. Moreover, we introduce a novel dropping strategy, based on machine-learned performance predictors, to select the queries to drop in order to sustain the largest possible query rate with a relative degradation in effectiveness.
Source: SPIRE'13 - String Processing and Information Retrieval. 20th International Symposium, pp. 49–55, Jerusalem, 7-9 October 2013
DOI: 10.1007/978-3-319-02432-5_9

See at: www.dcs.gla.ac.uk Open Access | doi.org Restricted | link.springer.com | CNR ExploRA

2013 Journal article Open Access

Discovering Tasks from Search Engine Query Logs
Lucchese C., Orlando S., Perego R., Silvestri F., Tolomei G.
Although Web search engines still answer user queries with lists of ten blue links to webpages, people are increasingly issuing queries to accomplish their daily tasks (e.g., finding a recipe, booking a flight, reading online news, etc.). In this work, we propose a two-step methodology for discovering tasks that users try to perform through search engines. First, we identify user tasks from individual user sessions stored in search engine query logs. In our vision, a user task is a set of possibly noncontiguous queries (within a user search session), which refer to the same need. Second, we discover collective tasks by aggregating similar user tasks, possibly performed by distinct users. To discover user tasks, we propose query similarity functions based on unsupervised and supervised learning approaches. We present a set of query clustering methods that exploit these functions in order to detect user tasks. All the proposed solutions were evaluated on a manually-built ground truth, and two of them performed better than state-of-the-art approaches. To detect collective tasks, we propose four methods that cluster previously discovered user tasks, which in turn are represented by the bag-of-words extracted from their composing queries. These solutions were also evaluated on another manually-built ground truth.
Source: ACM transactions on information systems 31 (2013): 1–43. doi:10.1145/2493175.2493179
DOI: 10.1145/2493175.2493179
Project(s): MIDAS via OpenAIRE

See at: ACM Transactions on Information Systems Open Access | dl.acm.org Restricted | ACM Transactions on Information Systems | CNR ExploRA

2013 Conference article Restricted

Modeling and predicting the task-by-task behavior of search engine users
Lucchese C., Orlando S., Perego R., Tolomei G., Silvestri F.
Web search engines answer user needs in a query-by-query fashion, namely they retrieve the set of the most relevant results to each issued query, independently. However, users often submit queries to perform multiple, related tasks. In this paper, we first discuss a methodology to discover from query logs the latent tasks performed by users. Furthermore, we introduce the Task Relation Graph (TRG) as a representation of users' search behaviors from a task-by-task perspective. The task-by-task behavior is captured by weighting the edges of TRG with a relatedness score computed between pairs of tasks, as mined from the query log. We validate our approach on a concrete application, namely a task recommender system, which suggests related tasks to users on the basis of the task predictions derived from the TRG. Finally, we show that the task recommendations generated by our solution are beyond the reach of existing query suggestion schemes, and that our method recommends tasks that users will likely perform in the near future.
Source: OAIR 2013 - 10th Conference on Open Research Areas in Information Retrieval, pp. 77–84, Lisbon, Portugal, 15-17 May 2013

See at: dl.acm.org Restricted | CNR ExploRA

2012 Journal article Restricted

Sorting on GPUs for large scale datasets: a thorough comparison
Capannini G., Silvestri F., Baraglia R.
Although sorting has been extensively studied in many research works, it still remains a challenge, in particular if we consider the implications of novel processor technologies such as manycores (i.e., GPUs, Cell/BE, multicore, etc.). In this paper, we compare different algorithms for sorting integers on stream multiprocessors and we discuss their viability on large datasets (such as those managed by search engines). In order to fully exploit the potential of the underlying architecture, we designed an optimized version of a sorting network in the K-model, a novel computational model designed to consider all the important features of many-core architectures. According to K-model, our bitonic sorting network mapping improves the three main aspects of many-core architectures, i.e., the processors exploitation, and the on-chip/off-chip memory bandwidth utilization. Furthermore, we are able to attain a space complexity of O(1). We experimentally compare our solution with state-of-the-art ones (namely, quick-sort and radix-sort) on GPUs. We also compute the complexity in the K-model for such algorithms. The conducted evaluation highlights that our bitonic sorting network is faster than quick-sort and slightly slower than radix-sort, yet, being an in-place solution, it consumes less memory than both algorithms.
Source: Information processing & management 48 (2012): 903–917. doi:10.1016/j.ipm.2010.11.010
DOI: 10.1016/j.ipm.2010.11.010

See at: Information Processing & Management Restricted | www.sciencedirect.com | CNR ExploRA

2012 Conference article Restricted

RecTour: a recommender system for tourists
Baraglia R., Frattari C., Muntean C. I., Nardini F. M., Silvestri F.
This paper presents a recommender system that provides personalized information about locations of potential interest to a tourist. The system generates suggestions, consisting of touristic places, according to the current position and history data describing the tourist movements. For the selection of tourist sites, the system uses a set of points of interest a priori identified. We evaluate our system on two datasets: a real and a synthetic one, both storing trajectories describing previous movements of tourists. The proposed solution has high applicability and the results show that the solution is both efficient and viable.
Source: International Workshop on Tourism Facilities, Macau, China, 4 December 2012
DOI: 10.1109/wi-iat.2012.88

See at: doi.org Restricted | ieeexplore.ieee.org | CNR ExploRA

2012 Contribution to book Restricted

Mining lifecycle event logs for enhancing service-based applications
Dustdar S., Leitner P., Nardini F. M., Silvestri F., Tolomei G.
Service-Oriented Architectures (SOAs), and traditional enterprise systems in general, record a variety of events (e.g., messages being sent and received between service components) to proper log files, i.e., event logs. These files constitute a huge and valuable source of knowledge that may be extracted through data mining techniques. To this end, process mining is increasingly gaining interest across the SOA community. The goal of process mining is to build models without a priori knowledge, i.e., to discover structured process models derived from specific patterns that are present in actual traces of service executions recorded in event logs. However, in this work, the authors focus on detecting frequent sequential patterns, thus considering process mining as a specific instance of the more general sequential pattern mining problem. Furthermore, they apply two sequential pattern mining algorithms to a real event log provided by the Vienna Runtime Environment for Service-oriented Computing, i.e., VRESCo. The obtained results show that the authors are able to find services that are frequently invoked together within the same sequence. Such knowledge could be useful at design-time, when service-based application developers could be provided with service recommendation tools that are able to predict and thus to suggest next services that should be included in the current service composition.
Source: Adaptive Web Services for Modular and Reusable Software Development: Tactics and Solutions, edited by Guadalupe Ortiz, Javier Cubo, pp. 196–206. Hershey: Information Science Reference, 2012
DOI: 10.4018/978-1-4666-2089-6.ch007
DOI: 10.4018/978-1-4666-2455-9.ch033
Project(s): S-CUBE via OpenAIRE

See at: doi.org Restricted | doi.org | www.igi-global.com | CNR ExploRA

2012 Journal article Open Access

Cite-as-you-write
Jack K., Sambati M., Silvestri F., Trani S., Venturini R.
Engines and dedicated social networks are generally used to search for relevant literature. Current technologies rely on keyword-based searches which, however, do not provide the support of a wider context. Cite-as-you-write aims to simplify and shorten this exploratory task: given a verbose description of the problem to be investigated, the system automatically recommends related papers/citations.
Source: ERCIM news 90 (2012).
Project(s): ADVANCE via OpenAIRE

See at: ercim-news.ercim.eu Open Access | CNR ExploRA

2012 Conference article Open Access

Blog distillation via sentiment-sensitive link analysis.
Berardi G., Esuli A., Sebastiani F., Silvestri F.
In this paper we approach blog distillation by adding a link analysis phase to the standard retrieval-by-topicality phase, where we also check whether a given hyperlink is a citation with a positive or a negative nature. This allows us to test the hypothesis that distinguishing approval from disapproval brings about benefits in blog distillation.
Source: Natural Language Processing and Information Systems. 17th International Conference on Applications of Natural Language to Information Systems, pp. 228–233, Groningen, The Netherlands, 26-28 June 2012
DOI: 10.1007/978-3-642-31178-9_26

See at: nmis.isti.cnr.it Open Access | doi.org Restricted | www.scopus.com | www.springerlink.com | CNR ExploRA

2012 Conference article Restricted

You should read this! Let me explain you why! - Explaining news recommendations to users.
Blanco R., Ceccarelli D., Lucchese C., Perego R., Silvestri F.
Recommender systems have become ubiquitous in content-based web applications, from news to shopping sites. Nonetheless, an aspect that has been largely overlooked so far in the recommender system literature is that of automatically building explanations for a particular recommendation. This paper focuses on the news domain, and proposes to enhance the effectiveness of news recommender systems by adding, to each recommendation, an explanatory statement to help the user to better understand if, and why, the item can be of interest to her. We consider the news recommender system as a black-box, and generate different types of explanations employing pieces of information associated with the news. In particular, we engineer text-based, entity-based, and usage-based explanations, and make use of Markov Logic Networks to rank the explanations on the basis of their effectiveness. The assessment of the model is conducted via a user study on a dataset of news read consecutively by actual users. Experiments show that news recommender systems can greatly benefit from our explanation module.
Source: 21st ACM International Conference on Information and Knowledge Management, pp. 1995–1999, Maui, Hawaii, 29 October - 2 November 2012
DOI: 10.1145/2396761.2398559

See at: dl.acm.org Restricted | doi.org | CNR ExploRA

2012 Conference article Open Access

Interactive and context-aware tag spell check and correction.
Bonchi F., Frieder O., Nardini F. M., Silvestri F., Vahabi H.
Collaborative content creation and annotation creates vast repositories of all sorts of media, and user-defined tags play a central role as they are a simple yet powerful tool for organizing, searching and exploring the available resources. We observe that when a user annotates a resource with a set of tags, those tags are introduced one at a time. Therefore, when the fourth tag is introduced, the knowledge represented by the previous three tags, i.e., the context in which the fourth tag is produced, is available and exploitable for generating potential corrections of the current tag. This context, together with the "wisdom of the crowd" represented by the co-occurrences of tags in all the resources of the repository, can be exploited to provide interactive tag spell check and correction. We develop this idea in a framework based on a weighted tag co-occurrence graph and on node relatedness measures defined on weighted neighborhoods. We test our proposal on a dataset coming from YouTube. The results show that our framework is effective as it outperforms two important baselines. We also show that it is efficient, thus enabling its use in modern tagging services.
Source: 21st ACM International Conference on Information and Knowledge Management, pp. 1869–1873, Maui, Hawaii, 29 October - 2 November 2012
DOI: 10.1145/2396761.2398534

See at: hpc.isti.cnr.it Open Access | dl.acm.org Restricted | doi.org | CNR ExploRA